Reliable and Cost-Effective PoS-Tagging

نویسندگان

  • Yu-Fang Tsai
  • Keh-Jiann Chen
چکیده

In order to achieve fast and high quality Part-of-speech (PoS) tagging, algorithms should be high accuracy and require less manually proofreading. To evaluate a tagging system, we proposed a new criterion of reliability, which is a kind of cost-effective criterion, instead of the conventional criterion of accuracy. The most cost-effective tagging algorithm is judged according to amount of manual editing and achieved final accuracy. The reliability of a tagging algorithm is defined to be the estimated best accuracy of the tagging under a fixed amount of proofreading. We compared the tagging accuracies and reliabilities among different tagging algorithms, such as Markov bi-gram model, Bayesian classifier, and context-rule classifier. According to our experiments, for the best cost-effective tagging algorithm, in average, 20% of samples of ambivalence words need to be rechecked to achieve an estimated final accuracy of 99%. The tradeoffs between amount of proofreading and final accuracy for different algorithms are also compared. It concludes that an algorithm with highest accuracy may not always be the most reliable algorithm.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...

متن کامل

سیستم برچسب گذاری اجزای واژگانی کلام در زبان فارسی

Abstract: Part-Of-Speech (POS) tagging is essential work for many models and methods in other areas in natural language processing such as machine translation, spell checker, text-to-speech, automatic speech recognition, etc. So far, high accurate POS taggers have been created in many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

متن کامل

برچسب‌گذاری ادات سخن زبان فارسی با استفاده از مدل شبکۀ فازی

Part of speech tagging (POS tagging) is an ongoing research in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...

متن کامل

بررسی مقایسه‌ای تأثیر برچسب‌زنی مقولات دستوری بر تجزیه در پردازش خودکار زبان فارسی

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...

متن کامل

Unsupervised Part-Of-Speech Tagging Supporting Supervised Methods

This paper investigates the utility of an unsupervised partof-speech (PoS) system in a task oriented way. We use PoS labels as features for different supervised NLP tasks: Word Sense Disambiguation, Named Entity Recognition and Chunking. Further we explore, how much supervised tagging can gain from unsupervised tagging. A comparative evaluation between variants of systems using standard PoS, un...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IJCLCLP

دوره 9  شماره 

صفحات  -

تاریخ انتشار 2003